A Neural Network Solution for Fixed-Final Time Optimal Control of Nonlinear Systems

Tao  Cheng,  Frank  L.  Lewis,  Fellow,  IEEE,  and  Murad  Abu-Khalaf 


Abstract: We consider the use of neural networks and
Hamilton-Jacobi-Bellman equations to obtain fixed-final time
optimal control laws for nonlinear systems that are affine in
the input. The method is based on Kronecker matrix methods
along with neural network approximation over a compact set
to solve a time-varying Hamilton-Jacobi-Bellman equation.
The result is a neural network feedback controller whose
time-varying coefficients are found by a priori offline tuning.
Convergence results are shown. The results of this paper are
demonstrated on two examples.

Keywords: Finite-horizon optimal control, Hamilton-Jacobi-Bellman, Neural Network control

I.  Introduction 

Solving the Hamilton-Jacobi-Bellman (HJB) equation to
obtain finite-time optimal control laws for nonlinear systems
is a challenging problem. It is known that this optimization
problem [16] requires solving a time-varying HJB equation
that is hard to solve in most cases. Approximate HJB
solutions have been pursued using many techniques, such as
those developed by Saridis and Lee [27], Beard et al.
[4][5][6], Beard, Bertsekas and Tsitsiklis [7], Munos et al.
[22], Kim, Lewis and Dawson [14], Liu and Balakrishnan [17],
Lyshevski and Meyer [20], and Lyshevski [18][19]. Huang and
Lin [13] provided a Taylor series expansion of the HJI
equation, which is closely related to the HJB equation.

Successful neural network (NN) controllers not based
on optimal techniques have been reported in Chen and Liu
[8], Lewis, Jagannathan and Yesildirek [15], Polycarpou
[24], Rovithakis and Christodoulou [25], Sanner and Slotine
[26], and Ge [11]. It has been shown that NN can effectively
extend adaptive control techniques to nonlinearly
parameterized systems. NN applications to optimal control
via the HJB equation were first proposed by Werbos [21].
Parisini and Zoppoli [23] used NN to derive optimal control
laws for discrete-time stochastic nonlinear systems.

In this paper, we use NN to approximately solve the
time-varying HJB equation employing a nonquadratic
functional. It is shown that, using a NN approach, one can
transform the problem into solving an ordinary differential
equation (ODE) backwards in time. The coefficients of this
ODE are obtained by the weighted residuals method.

Motivated by the important results in [4], we are able
to approximately solve the time-varying HJB equation
without the policy iteration that uses the so-called
generalized HJB (GHJB) equation followed by control law
updates. We accomplish this by using a neural network
approximation for the value function, which is based on a
universal basis set.

II.  Problem  Statement 

Consider a nonlinear dynamical system that is affine in the
control, of the form

$$\dot{x} = f(x) + g(x)u(t), \tag{1}$$

where $x \in \mathbb{R}^n$, $f(x) \in \mathbb{R}^n$, $g(x) \in \mathbb{R}^{n \times m}$ and the input
$u(t) \in \mathbb{R}^m$. The dynamics $f(x)$ and $g(x)$ are assumed to
be known. Assume that $f + gu$ is Lipschitz continuous on
a set $\Omega \subset \mathbb{R}^n$ containing the origin, and that system (1) is
stabilizable in the sense that there exists a continuous
control on $\Omega$ that asymptotically stabilizes the system. It
is desired to find the control $u$ that minimizes a
generalized nonquadratic functional

$$V(x(t_0), t_0) = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} \left[ Q(x) + W(u) \right] dt \tag{2}$$

with $Q(x)$, $W(u)$ positive definite on $\Omega$, i.e. $\forall x \neq 0$,
$x \in \Omega$, $Q(x) > 0$ and $x = 0 \Rightarrow Q(x) = 0$. A common
choice is $W(u) = u^T R u$, where $R > 0$. The final time
$t_f$ is fixed.


Definition 1. Admissible Controls.

A control $u$ is defined to be admissible with respect to (2)
on $\Omega$, denoted by $u \in \Psi(\Omega)$, if $u$ is continuous on $\Omega$,
$u(0) = 0$, $u$ stabilizes (1) on $\Omega$, and $\forall x_0 \in \Omega$, $V(x_0)$
is finite. ■


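Definition 2. Sobolev Space.

$H^{m,p}(\Omega)$: Let $\Omega$ be an open set in $\mathbb{R}^n$ and let
$u \in C^m(\Omega)$. Define a norm on $u$ by

$$\|u\|_{m,p} = \sum_{0 \le |\alpha| \le m} \left( \int_\Omega |D^\alpha u(x)|^p \, dx \right)^{1/p}, \quad 1 \le p < \infty.$$

This is the Sobolev norm in which the integration is
Lebesgue. The completion of $\{ u \in C^m(\Omega) : \|u\|_{m,p} < \infty \}$
with respect to $\|\cdot\|_{m,p}$ is the Sobolev space $H^{m,p}(\Omega)$. For
$p = 2$, the Sobolev space is a Hilbert space. ■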

The convergence proofs of the least-squares method
are done in the Sobolev function space $H^{1,2}(\Omega)$ setting
[2], since we need to prove the convergence of both
$V_L(x)$ and its gradient.

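An infinitesimal equivalent to (2) is [16]

$$-\frac{\partial V}{\partial t} = L + \left( \frac{\partial V}{\partial x} \right)^T \big( f(x) + g(x)u \big), \tag{3}$$

where $L = Q(x) + W(u)$. This is a time-varying partial
differential equation. It is in fact a Lyapunov equation that
yields the value $V$ for any given $u$ and is solved backward in
time from $t = t_f$. By setting $t_0 = t_f$ in (2), its boundary
condition is seen to be

$$V(x(t_f), t_f) = \phi(x(t_f), t_f). \tag{4}$$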

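According to Bellman's optimality principle [16], the
optimal cost is given by

$$-\frac{\partial V^*}{\partial t} = \min_{u(t)} \left[ L + \left( \frac{\partial V^*}{\partial x} \right)^T \big( f(x) + g(x)u(t) \big) \right], \tag{5}$$

which yields the optimal control

$$u^*(x,t) = -\frac{1}{2} R^{-1} g^T(x) \frac{\partial V^*}{\partial x}, \tag{6}$$

where $V^*(x,t)$ is the optimal value function. Substituting (6)
into (5) yields the well-known time-varying HJB equation
[16]

$$HJB(V^*) = \frac{\partial V^*}{\partial t} + \left( \frac{\partial V^*}{\partial x} \right)^T f + Q - \frac{1}{4} \left( \frac{\partial V^*}{\partial x} \right)^T g R^{-1} g^T \frac{\partial V^*}{\partial x} = 0. \tag{7}$$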

This equation and (6) provide the solution to fixed-final
time optimal control for general nonlinear systems.
However, this equation is generally impossible to solve analytically.
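For the common choice $W(u) = u^T R u$ with symmetric $R > 0$, the step from (5) to (6) and (7) can be made explicit. The following worked derivation is a sketch under that assumption; the shorthand $V_x$ for $\partial V^* / \partial x$ is introduced here only for brevity.

```latex
% Stationarity of the bracket in (5) with respect to u, for W(u) = u^T R u:
\frac{\partial}{\partial u} \Big[ Q(x) + u^T R u + V_x^T \big( f(x) + g(x) u \big) \Big]
  = 2 R u + g^T(x) V_x = 0
  \quad \Longrightarrow \quad
  u^* = -\tfrac{1}{2} R^{-1} g^T(x) V_x .

% Substituting u^* into the two control-dependent terms:
u^{*T} R u^* = \tfrac{1}{4} V_x^T g R^{-1} g^T V_x ,
\qquad
V_x^T g u^* = -\tfrac{1}{2} V_x^T g R^{-1} g^T V_x .

% Their sum is -\tfrac{1}{4} V_x^T g R^{-1} g^T V_x, so (5) becomes
-\frac{\partial V^*}{\partial t}
  = Q + V_x^T f - \tfrac{1}{4} V_x^T g R^{-1} g^T V_x ,
```

which is (7) after moving all terms to one side.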


III. Nonlinear Fixed-Final-Time HJB Solution by NN
Least-Squares Approximation

The HJB equation (7) is difficult to solve for the cost
function $V(x)$. In this section, neural networks are used to
approximate the value function in (7) over $\Omega$. The result is an
efficient, practical, and computationally tractable solution
algorithm to find nearly optimal state feedback controllers
for nonlinear systems.

A. NN Approximation of V(x)

It is well known that NN can be used to approximate
smooth functions on prescribed compact sets (Hornik [12]).
Since the analysis required here is restricted to the region of
asymptotic stability (RAS) of some initial stabilizing
controller, NN are natural for this application. We use the
following equation to approximate $V$:

$$V_L(x,t) = \sum_{j=1}^{L} w_j(t) \sigma_j(x) = w_L^T(t) \sigma_L(x), \tag{8}$$

which is a NN with activation functions $\sigma_j(x) \in C^1(\Omega)$,
$\sigma_j(0) = 0$. The NN weights are $w_j(t)$ and $L$ is the
number of hidden-layer neurons.
$\sigma_L(x) = [\sigma_1(x)\ \sigma_2(x)\ \cdots\ \sigma_L(x)]^T$ is the vector of activation
functions, and $w_L(t) = [w_1(t)\ w_2(t)\ \cdots\ w_L(t)]^T$ is the vector of NN
weights.

Since one requires $\partial V(x,t)/\partial t$ in (7), the NN weights are
selected to be time-varying. This is similar to methods such
as assumed mode shapes in the study of flexible mechanical
systems [3]. However, here $\sigma_L(x)$ is a NN activation
vector, not a set of eigenfunctions. That is, the NN
approximation property significantly simplifies the
specification of $\sigma_L(x)$. For the infinite final time case, the
NN weights are constant [1]. The NN weights will be
selected to minimize a residual error in a least-squares sense
over a set of points sampled from a compact set $\Omega$ inside
the RAS of the initial stabilizing control [10].

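Note that

$$\frac{\partial V_L}{\partial x} = \left( \frac{\partial \sigma_L(x)}{\partial x} \right)^T w_L(t) = \nabla \sigma_L^T(x) w_L(t), \tag{9}$$

where $\nabla \sigma_L$ is the Jacobian $\partial \sigma_L / \partial x$, and that

$$\frac{\partial V_L}{\partial t} = \dot{w}_L^T(t) \sigma_L(x). \tag{10}$$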

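Therefore, approximating $V(x)$ by $V_L(x)$ in the HJB
equation (7) results in

$$\begin{aligned}
&-\dot{w}_L^T(t) \sigma_L(x) - w_L^T(t) \nabla \sigma_L(x) f(x) \\
&\quad + \tfrac{1}{4} w_L^T(t) \nabla \sigma_L(x) g(x) R^{-1} g^T(x) \nabla \sigma_L^T(x) w_L(t) - Q(x) = e_L(x),
\end{aligned} \tag{11}$$

or, up to the overall sign convention of (7),

$$HJB\Big( V_L(x,t) = \sum_{j=1}^{L} w_j(t) \sigma_j(x) \Big) = e_L(x), \tag{12}$$

where $e_L(x)$ is a residual equation error. From (6), the
corresponding control input is

$$u_L(x,t) = -\frac{1}{2} R^{-1} g^T(x) \nabla \sigma_L^T(x) w_L(t). \tag{13}$$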
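The quantities in (8), (9) and (13) are straightforward to evaluate once an activation vector is fixed. The following is a minimal sketch, assuming a hypothetical quadratic basis with $L = 3$ and $n = 2$; the function names, the identity input matrix and the sample weights are illustrative choices, not taken from the paper.

```python
import numpy as np

def sigma(x):
    """Activation vector sigma_L(x) for a hypothetical quadratic basis (L = 3, n = 2)."""
    x1, x2 = x
    return np.array([x1 ** 2, x1 * x2, x2 ** 2])

def grad_sigma(x):
    """Jacobian d sigma_L / dx, shape (L, n)."""
    x1, x2 = x
    return np.array([[2.0 * x1, 0.0],
                     [x2, x1],
                     [0.0, 2.0 * x2]])

def value(x, w):
    """V_L = w_L^T sigma_L(x), as in (8)."""
    return w @ sigma(x)

def value_gradient(x, w):
    """dV_L/dx = grad_sigma^T w_L, as in (9)."""
    return grad_sigma(x).T @ w

def control(x, w, g, R_inv):
    """u_L = -1/2 R^{-1} g^T(x) grad_sigma^T(x) w_L, as in (13)."""
    return -0.5 * R_inv @ g(x).T @ value_gradient(x, w)

# Illustrative values (assumptions): identity input matrix and R = I.
x = np.array([1.0, 2.0])
w = np.array([1.0, 0.0, 1.0])
g = lambda x: np.eye(2)
R_inv = np.eye(2)

v = value(x, w)               # x1^2 + x2^2 at (1, 2)
dv = value_gradient(x, w)     # gradient of x1^2 + x2^2
u = control(x, w, g, R_inv)   # -1/2 of the gradient when g = R = I
```

In the time-varying setting the same functions are simply called with the weight vector $w_L(t)$ at the current time.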

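To find the least-squares solution for $w_L(t)$, the
method of weighted residuals is used [10]. The weight
derivatives $\dot{w}_L(t)$ are determined by projecting the
residual error onto $\partial e_L(x) / \partial \dot{w}_L(t)$ and setting the result to
zero $\forall x \in \Omega$ using the inner product, i.e.

$$\left\langle \frac{\partial e_L(x)}{\partial \dot{w}_L(t)}, e_L(x) \right\rangle_\Omega = 0. \tag{14}$$

From (11) we can get

$$\frac{\partial e_L(x)}{\partial \dot{w}_L(t)} = -\sigma_L(x). \tag{15}$$

Since the overall sign does not affect (14), one obtains

$$\begin{aligned}
&\left\langle -\dot{w}_L^T(t) \sigma_L(x), \sigma_L(x) \right\rangle_\Omega + \left\langle -w_L^T(t) \nabla \sigma_L(x) f(x), \sigma_L(x) \right\rangle_\Omega \\
&\quad + \left\langle \tfrac{1}{4} w_L^T(t) \nabla \sigma_L(x) g(x) R^{-1} g^T(x) \nabla \sigma_L^T(x) w_L(t), \sigma_L(x) \right\rangle_\Omega \\
&\quad + \left\langle -Q(x), \sigma_L(x) \right\rangle_\Omega = 0,
\end{aligned} \tag{16}$$

so that

$$\begin{aligned}
\dot{w}_L(t) = {} & -\left\langle \sigma_L(x), \sigma_L(x) \right\rangle_\Omega^{-1} \left\langle \nabla \sigma_L(x) f(x), \sigma_L(x) \right\rangle_\Omega w_L(t) \\
& + \left\langle \sigma_L(x), \sigma_L(x) \right\rangle_\Omega^{-1} \left\langle \tfrac{1}{4} w_L^T(t) \nabla \sigma_L(x) g(x) R^{-1} g^T(x) \nabla \sigma_L^T(x) w_L(t), \sigma_L(x) \right\rangle_\Omega \\
& - \left\langle \sigma_L(x), \sigma_L(x) \right\rangle_\Omega^{-1} \left\langle Q(x), \sigma_L(x) \right\rangle_\Omega
\end{aligned} \tag{17}$$

with boundary condition $V_L(x, t_f) = \phi(x(t_f), t_f)$.

Therefore, the NN weights are simply found by integrating
this nonlinear ODE backwards in time.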
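As a sanity check on (17), consider a hypothetical scalar case that is not taken from the paper: $f(x) = ax$, $g(x) = b$, $Q(x) = qx^2$, $W(u) = ru^2$, and a single activation $\sigma_1(x) = x^2$. Every inner product is then a multiple of the same moment, which cancels, and (17) collapses to a scalar Riccati differential equation. The symbols $a$, $b$, $q$, $r$ and $m_4$ are introduced here only for this illustration.

```latex
\sigma_L(x) = x^2, \qquad \nabla \sigma_L(x) = 2x, \qquad m_4 := \int_\Omega x^4 \, dx > 0 ,

\langle \sigma_L, \sigma_L \rangle_\Omega = m_4 , \qquad
\langle \nabla \sigma_L f, \sigma_L \rangle_\Omega = 2 a \, m_4 , \qquad
\langle Q, \sigma_L \rangle_\Omega = q \, m_4 ,

\Big\langle \tfrac{1}{4} w \, (2x) \, b \, r^{-1} b \, (2x) \, w , \; \sigma_L \Big\rangle_\Omega
  = \frac{b^2}{r} \, w^2 \, m_4 ,

\dot{w}(t) = -2 a \, w(t) + \frac{b^2}{r} \, w^2(t) - q .
```

With $V = w(t) x^2$ this is the familiar scalar Riccati equation $-\dot{w} = 2aw + q - (b^2/r) w^2$, integrated backwards from $w(t_f)$.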

The following lemmas show that this procedure
provides a nearly optimal solution for the time-varying
optimal control problem if $L$ is selected large enough.


Lemma 1. Convergence of Approximate Value Function.

If $\Omega$ is compact, $Q(x)$ is continuous on $\Omega$ and is in
the space $\mathrm{span}\{\sigma_j\}_1^\infty$, and if the coefficients $|w_j(t)|$ are
uniformly bounded for all $L$, then
$\| V_L - V \|_{L_2(\Omega)} \to 0$ as $L$ increases.

Proof. See [9]. ■


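Lemma 2. Convergence of Value Function Gradient.

Under the hypothesis of Lemma 1,
$\left\| \dfrac{\partial V_L}{\partial x} - \dfrac{\partial V}{\partial x} \right\|_{L_2(\Omega)} \to 0$ as $L$ increases.

Proof. See [9]. ■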

At this point we have convergence in the mean
of the approximate value function and the value function
gradient. This demonstrates convergence in the mean in the
Sobolev space $H^{1,2}(\Omega)$.


Lemma 3. Admissibility of $u_L(x)$.

If the conditions of Lemma 1 are satisfied, then
$\exists L_0 : L > L_0,\ u_L \in \Psi(\Omega)$.

Proof. Define

$$V(x, u) = \phi(x(t_f), t_f) + \int_{t_0}^{t_f} \left[ Q(x) + W(u) \right] dt.$$

We must show that for $L$ sufficiently large, $V(x, u_L) < \infty$
when $V(x, u) < \infty$. But $\phi(x(t_f), t_f)$ depends continuously
on $u$, i.e., small variations in $u$ result in small
variations in $\phi$. Also, since $\| u_L(\cdot) \|$ can be made
arbitrarily close to $\| u(\cdot) \|$, $V(x, u_L)$ can be made
arbitrarily close to $V(x, u)$. Therefore, for $L$ sufficiently
large, $V(x, u_L) < \infty$ and hence $u_L(x)$ is admissible. ■

Lemma 3 shows that if the number $L$ of hidden-layer
units is large enough, the proposed solution method yields
an admissible control.


B. Optimal Algorithm Based on NN Approximation

Solving the integration in (16) is computationally expensive,
since evaluation of the $L_2$ inner product over $\Omega$ is
required. This can be addressed using the collocation
method [10]. The integrals can be well approximated by
discretization: a mesh of points of size $\Delta x$ is introduced
over the integration region $\Omega$. The terms of (16)
can then be rewritten as follows:

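$$A = \left[ \sigma_L(x)|_{x_1} \ \cdots \ \sigma_L(x)|_{x_p} \right]^T, \tag{18}$$

$$B = \left[ \nabla \sigma_L(x) f(x)|_{x_1} \ \cdots \ \nabla \sigma_L(x) f(x)|_{x_p} \right]^T, \tag{19}$$

$$C = \left[ \tfrac{1}{4} \nabla \sigma_L(x) g(x) R^{-1} g^T(x) \nabla \sigma_L^T(x)|_{x_1} \ \cdots \ \tfrac{1}{4} \nabla \sigma_L(x) g(x) R^{-1} g^T(x) \nabla \sigma_L^T(x)|_{x_p} \right]^T, \tag{20}$$

$$D = \left[ Q(x)|_{x_1} \ \cdots \ Q(x)|_{x_p} \right]^T, \tag{21}$$

where $p$ in $x_p$ represents the number of points of the
mesh. Reducing the mesh size, we have

$$\left\langle -\dot{w}_L^T(t) \sigma_L(x), \sigma_L(x) \right\rangle_\Omega = \lim_{\|\Delta x\| \to 0} -\left( A^T A \right) \cdot \dot{w}_L(t) \cdot \Delta x, \tag{22}$$

$$\left\langle -w_L^T(t) \nabla \sigma_L(x) f(x), \sigma_L(x) \right\rangle_\Omega = \lim_{\|\Delta x\| \to 0} -\left( A^T B \right) \cdot w_L(t) \cdot \Delta x, \tag{23}$$

$$\left\langle \tfrac{1}{4} w_L^T(t) \nabla \sigma_L(x) g(x) R^{-1} g^T(x) \nabla \sigma_L^T(x) w_L(t), \sigma_L(x) \right\rangle_\Omega = \lim_{\|\Delta x\| \to 0} A^T w_L^T(t) C w_L(t) \cdot \Delta x, \tag{24}$$

$$\left\langle -Q(x), \sigma_L(x) \right\rangle_\Omega = \lim_{\|\Delta x\| \to 0} -\left( A^T D \right) \cdot \Delta x. \tag{25}$$

This implies that (16) can be converted to

$$\dot{w}_L(t) = -\left( A^T A \right)^{-1} A^T B \, w_L(t) + \left( A^T A \right)^{-1} A^T w_L^T(t) C w_L(t) - \left( A^T A \right)^{-1} A^T D. \tag{26}$$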

This is a nonlinear ODE that can easily be integrated
backwards from the final condition $w_L(t_f)$ to find the
least-squares optimal NN weights. Then, the nearly optimal
value function is given by

$$V_L(x,t) = w_L^T(t) \sigma_L(x),$$

and the nearly optimal control by

$$u_L(x,t) = -\frac{1}{2} R^{-1} g^T(x) \nabla \sigma_L^T(x) w_L(t). \tag{27}$$

Note that in practice, we use a numerically efficient
least-squares routine to solve (26) without matrix inversion.
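The collocation procedure of (18)-(26) can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: the helper names, the classical Runge-Kutta integrator, and the scalar test problem ($\dot{x} = -x + u$, $Q(x) = x^2$, $W(u) = u^2$, single activation $x^2$, zero terminal weight) are all assumptions made here. The weight derivative is obtained with `numpy.linalg.lstsq`, so $(A^T A)^{-1}$ is never formed explicitly.

```python
import numpy as np

def collocation_matrices(mesh, sigma, grad_sigma, f, g, R_inv, Q):
    """Stack the quantities of (18)-(21) over the mesh points."""
    A = np.array([sigma(x) for x in mesh])                      # shape (p, L)
    B = np.array([grad_sigma(x) @ f(x) for x in mesh])          # shape (p, L)
    C = np.array([0.25 * grad_sigma(x) @ g(x) @ R_inv @ g(x).T @ grad_sigma(x).T
                  for x in mesh])                               # shape (p, L, L)
    D = np.array([Q(x) for x in mesh])                          # shape (p,)
    return A, B, C, D

def weight_rate(w, A, B, C, D):
    """Right-hand side of (26), computed as a least-squares solve for w_dot."""
    quad = np.einsum("i,pij,j->p", w, C, w)   # w^T C_k w at each mesh point
    rhs = -B @ w + quad - D
    w_dot, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return w_dot

def integrate_backward(w_final, horizon, steps, mats):
    """Classical RK4 with negative time step, from t_f back to t_f - horizon."""
    h = horizon / steps
    w = np.asarray(w_final, dtype=float)
    for _ in range(steps):
        k1 = weight_rate(w, *mats)
        k2 = weight_rate(w - 0.5 * h * k1, *mats)
        k3 = weight_rate(w - 0.5 * h * k2, *mats)
        k4 = weight_rate(w - h * k3, *mats)
        w = w - (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return w

# Hypothetical scalar test problem (an assumption, not from the paper).
mesh = [np.array([x]) for x in np.linspace(-1.0, 1.0, 21)]
mats = collocation_matrices(
    mesh,
    sigma=lambda x: np.array([x[0] ** 2]),
    grad_sigma=lambda x: np.array([[2.0 * x[0]]]),
    f=lambda x: -x,
    g=lambda x: np.array([[1.0]]),
    R_inv=np.array([[1.0]]),
    Q=lambda x: float(x @ x),
)
w_start = integrate_backward(w_final=[0.0], horizon=10.0, steps=2000, mats=mats)
```

For this test problem (26) reduces to $\dot{w} = 2w + w^2 - 1$, so far from the final time the weight should settle at the positive root $\sqrt{2} - 1$ of the corresponding scalar algebraic Riccati equation.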


IV. Illustrative Example

We now demonstrate our NN control technique for
finding nearly optimal fixed-final time controllers. Consider
the following linear system

$$\begin{aligned}
\dot{x}_1 &= 2x_1 + 3x_2 + u_1, \\
\dot{x}_2 &= 5x_1 + 6x_2 + 2u_2.
\end{aligned} \tag{28}$$

Define the performance index

$$V(x(t_0), t_0) = \frac{1}{2} x(t_f)^T S(t_f) x(t_f) + \frac{1}{2} \int_{t_0}^{t_f} \left( x^T Q x + u^T R u \right) dt.$$

Here $Q$ and $R$ are chosen as identity matrices. The
steady-state solution of the Riccati equation can be obtained
by solving the algebraic Riccati equation (ARE). The result
is

$$P = \begin{bmatrix} 3.1610 & 2.8234 \\ 2.8234 & 3.6777 \end{bmatrix}.$$

Our algorithm should give the same steady-state value.
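The reported matrix can be checked directly against the ARE. The sketch below assumes the standard form $A^T P + PA + Q - PBR^{-1}B^T P = 0$ and reads the system matrices off (28) as $A = [2\ 3;\ 5\ 6]$ and $B = \mathrm{diag}(1, 2)$; the input matrix in particular is an interpretation of the extracted equation, which the small residual supports.

```python
import numpy as np

# System matrices as read from (28); B = diag(1, 2) is an assumed reading.
A = np.array([[2.0, 3.0],
              [5.0, 6.0]])
B = np.diag([1.0, 2.0])
Q = np.eye(2)
R = np.eye(2)

# Steady-state Riccati solution reported in the text (four decimals).
P = np.array([[3.1610, 2.8234],
              [2.8234, 3.6777]])

R_inv = np.linalg.inv(R)

# Residual of the algebraic Riccati equation; small values reflect rounding of P.
residual = A.T @ P + P @ A + Q - P @ B @ R_inv @ B.T @ P

# Closed-loop matrix under u = -R^{-1} B^T P x; its eigenvalues should be stable.
closed_loop_eigs = np.linalg.eigvals(A - B @ R_inv @ B.T @ P)

print("max |ARE residual|:", np.max(np.abs(residual)))
print("closed-loop eigenvalues:", closed_loop_eigs)
```

The residual is at the level expected from four-decimal rounding, and both closed-loop eigenvalues have negative real part, so the reported $P$ is the stabilizing solution for this reading of the system.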

To find a nearly optimal time-varying controller, the
following smooth function is used to approximate the value
function of the system:

$$V(x_1, x_2) = w_1 x_1^2 + w_2 x_1 x_2 + w_3 x_2^2.$$

This is a NN with polynomial activation functions, and
hence $V(0) = 0$.

Note that if $V = x^T P x$, then $P = \begin{bmatrix} w_1 & w_2/2 \\ w_2/2 & w_3 \end{bmatrix}$.

In this example, three neurons are chosen and
$w_L(t_f) = [10, 10, 0]$. Our algorithm was used to determine
the nearly optimal time-varying control law by integrating
(26) backwards in time. A least-squares algorithm was used
to compute $\dot{w}_L(t)$ at each integration time. Figure 1
shows that, about six seconds back from $t_f$, the weights
converge to the solution of the algebraic Riccati equation.
The control signal is

$$u = -\frac{1}{2} R^{-1} g^T P x. \tag{29}$$

The states and control signal are shown in Figures 2 and 3.
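The convergence of the weights to the ARE solution can be reproduced with a short script. This is a sketch under stated assumptions rather than the authors' code: the input matrix is read as $\mathrm{diag}(1, 2)$, the running cost is taken as $x^T x + u^T u$ (without the factor $\tfrac{1}{2}$, so that $V = x^T P x$), the terminal weights are set to zero instead of the value used in the paper, and the mesh, step size and helper names are illustrative.

```python
import numpy as np

A_sys = np.array([[2.0, 3.0], [5.0, 6.0]])   # drift matrix read from (28)
G = np.diag([1.0, 2.0])                      # input matrix (assumed reading of (28))
R_inv = np.eye(2)

def sigma(x):
    """Quadratic activations [x1^2, x1*x2, x2^2]."""
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def grad_sigma(x):
    """Jacobian of sigma, shape (3, 2)."""
    return np.array([[2.0 * x[0], 0.0], [x[1], x[0]], [0.0, 2.0 * x[1]]])

# Collocation mesh on [-1, 1]^2 and the stacked matrices of (18)-(21).
grid = np.linspace(-1.0, 1.0, 5)
mesh = [np.array([a, b]) for a in grid for b in grid]
A = np.array([sigma(x) for x in mesh])
B = np.array([grad_sigma(x) @ (A_sys @ x) for x in mesh])
C = np.array([0.25 * grad_sigma(x) @ G @ R_inv @ G.T @ grad_sigma(x).T for x in mesh])
D = np.array([x @ x for x in mesh])
A_pinv = np.linalg.pinv(A)   # least-squares solution operator for A w_dot = rhs

def weight_rate(w):
    """Right-hand side of (26)."""
    return A_pinv @ (-B @ w + np.einsum("i,pij,j->p", w, C, w) - D)

def integrate_backward(w_final, horizon, steps):
    """Classical RK4 with negative time step, from t_f back to t_f - horizon."""
    h = horizon / steps
    w = np.asarray(w_final, dtype=float)
    for _ in range(steps):
        k1 = weight_rate(w)
        k2 = weight_rate(w - 0.5 * h * k1)
        k3 = weight_rate(w - 0.5 * h * k2)
        k4 = weight_rate(w - h * k3)
        w = w - (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return w

# Zero terminal weights are an assumption made for this sketch.
w_steady = integrate_backward([0.0, 0.0, 0.0], horizon=8.0, steps=4000)
P_from_weights = np.array([[w_steady[0], w_steady[1] / 2.0],
                           [w_steady[1] / 2.0, w_steady[2]]])
print(P_from_weights)
```

Because the residual of a linear system with a quadratic basis is itself a quadratic form, the least-squares fit is exact here and the weight ODE coincides with the Riccati differential equation, which is why the steady-state weights reproduce the ARE solution.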
Fig. 1. Linear System Weights
Fig. 2. State Trajectory of Linear System
Fig. 3. Optimal NN Control Law


V.  Conclusion 

We use NN to approximately solve the time-varying HJB
equation. The technique can be applied to both linear and
nonlinear systems. Full conditions for convergence have
been derived. Simulation examples have been carried out to
show the effectiveness of the proposed method.